The covariance graph (aka bi-directed graph) of a probability distribution
p is the undirected graph G where two nodes are adjacent iff their
corresponding random variables are marginally dependent in p.* In this
paper, we present a graphical criterion for reading dependencies from G,
under the assumption that p satisfies the graphoid properties as well as
weak transitivity and composition. We prove that the graphical criterion
is sound and complete in a certain sense. We argue that our assumptions
are not too restrictive. For instance, all the regular Gaussian
probability distributions satisfy them.

1. Introduction. The covariance graph (aka bi-directed graph) of a 
probability distribution p is the undirected graph G where two nodes are 
adjacent iff their corresponding random variables are marginally dependent 
in p. Covariance graphs were introduced in (Cox and Wermuth, 1993) to 
represent independence models. Since then, they have received considerable 
attention. See, for instance, (Banerjee and Richardson, 2003; Chaudhuri 
et al., 2007; Cox and Wermuth, 1996; Drton and Richardson, 2003, 2008;
Kauermann, 1996; Lupparelli et al., 2009; Malouche and Rajaratnam, 2009; 
Pearl and Wermuth, 1994; Richardson, 2003; Wermuth, 1995; Wermuth and 
Cox, 1998; Wermuth et al., 2006; Wermuth, 2011, 2012). The works (Baner- 
jee and Richardson, 2003; Kauermann, 1996) are particularly important for 
the interpretation of covariance graphs in terms of independencies. Specifi- 
cally, these works introduce a graphical criterion for reading independencies 
from the covariance graph G of a probability distribution p, under the as- 
sumption that p satisfies the graphoid properties and composition. In this 
paper, we show that G can also be used to read dependencies holding in p. 

Keywords: chain graphs, concentration graphs, covariance graphs 

*It is worth mentioning that our definition of covariance graph is somewhat non- 
standard. The standard definition states that the lack of an edge between two nodes 
of G implies that their corresponding random variables are marginally independent in p. 
This difference in the definition is important in this paper. 
Specifically, we present a graphical criterion for reading dependencies from
G under the assumption that p satisfies the graphoid properties, weak tran- 
sitivity and composition. We also prove that our graphical criterion is sound 
and complete. Here, complete means that it is able to read all the depen- 
dencies in p that can be derived by applying the graphoid properties, weak 
transitivity and composition to the dependencies used in the construction 
of G and the independencies obtained from G. We also show that there ex- 
ist important families of probability distributions that satisfy the graphoid 
properties, weak transitivity and composition. These include, for instance, 
all the regular Gaussian probability distributions. 

Note that this paper would be unnecessary if p satisfied all and only the
independencies that can be read from G via the graphical criterion in
(Banerjee and Richardson, 2003; Kauermann, 1996), i.e. if p were faithful
to G. We will see that one cannot safely assume faithfulness in general.
Therefore, one is only entitled to assume that p satisfies all (but not
necessarily only) the independencies that can be read from G via the
graphical criterion in (Banerjee and Richardson, 2003; Kauermann, 1996),
i.e. that p is Markov wrt G. This is the motivation for this paper.

Two previous works that somehow address the problem of reading de- 
pendencies off covariance graphs are (Wermuth, 1995; Wermuth and Cox, 
1998). These works propose to determine whether two random variables 
Ua and Ub are dependent given some other random variables Uz by, first, 
constructing the covariance graph of the conditional probability distribu- 
tion given Uz of any set of random variables that includes Ua and Ub and, 
then, checking if the nodes corresponding to Ua and Ub are adjacent in 
the covariance graph constructed. Therefore, these works construct multi- 
ple covariance graphs, one for each conditional probability distribution of 
interest, from which only the dependencies used in their construction are 
read. The work presented in this paper is radically different: We only con- 
struct the covariance graph of the probability distribution at hand and read 
from it many more dependencies than those used in its construction. While 
this is the first work where a sound and complete graphical criterion for 
reading dependencies off covariance graphs is developed, it is worth men- 
tioning that there already exist sound and complete graphical criteria for 
reading dependencies off other graphical models. For instance, there exists 
a sound and complete graphical criterion for reading dependencies off the 
concentration graph (aka minimal undirected independence map or Markov 
network) of a probability distribution that satisfies the graphoid properties 
(Bouckaert, 1995), or the graphoid properties and weak transitivity (Peña
et al., 2009). As a matter of fact, the graphical criterion that we present
in this paper is dual to the one in (Peña et al., 2009). There also exists
a sound and complete graphical criterion for reading dependencies off the
Bayesian network (aka minimal directed independence map) of a probability
distribution that satisfies the graphoid properties (Bouckaert, 1995),
the graphoid properties and weak transitivity (Peña, 2010), or the graphoid
properties and weak transitivity and composition (Peña, 2007). In the last
two references, the Bayesian networks are restricted to be polytrees. Note
that (Bouckaert, 1995; Peña, 2007, 2010; Peña et al., 2009) address a related
but not more general problem than the one in this paper, since neither con- 
centration graphs nor Bayesian networks include covariance graphs. Related 
more general problems than the one studied in this paper have been recently 
addressed, though. For instance, a method to read dependencies from mul- 
tivariate regression graphs, which include covariance graphs, is proposed in 
(Wermuth, 2012). The author also presents necessary and sufficient condi- 
tions for the method to be sound. These conditions are the same as the ones 
considered in this paper, namely the graphoid properties plus weak tran- 
sitivity and composition. Unlike in this paper, no proof of completeness of 
the method proposed appears in (Wermuth, 2012). Another related more 
general work is (Wermuth, 2011), where the author shows how summary 
graphs, which include covariance graphs, can help to detect which depen- 
dencies remain undistorted and which do not after marginalization and/or 
conditioning in a probability distribution generated over a so-called
parent graph. It should be pointed out that generating the probability
distribution over a parent graph implies that it satisfies the same
conditions as the ones considered in this paper (Wermuth, 2011, Proposition 3).
Again, unlike in this paper, the completeness question is not addressed in 
(Wermuth, 2011). Finally, it should be noted that (Wermuth, 2011, 2012) 
make use of the graphical criterion presented in (Sadeghi and Lauritzen, 
2011) for reading independencies from loopless mixed graphs, which include 
multivariate regression graphs, summary graphs and parent graphs. This 
criterion is sound and complete in a certain sense, given that the graphoid
properties and composition hold (Sadeghi and Lauritzen, 2011, Theorem 3). 
These conditions are, in fact, not only sufficient but necessary too (Sadeghi 
and Lauritzen, 2011, Section 6.3). 

We think that the work presented in this paper can be of great interest 
for the artificial intelligence community. Graphs are one of the most com- 
monly used metaphors for representing knowledge because they appeal to 
human intuition (Pearl, 1988). Furthermore, graphs are parsimonious mod- 
els because they trade off accuracy for simplicity. Consider, for instance, 
representing the independence model induced by a probability distribution 
as a graph. Though this graph is typically less accurate than the
probability distribution (the graph may not represent all the
(in)dependencies and those that are represented are not quantified), it
also requires less space to be stored and less time to be communicated than
the probability distribution, which may be desirable features in some
applications. Thus, it seems sensible to develop tools for reasoning with
graphs. Our graphical criterion is one such tool: Just as the graphical
criterion in (Banerjee and Richardson, 2003; Kauermann, 1996) makes the
discovery of independencies amenable to human reasoning by enabling one to
read independencies off a covariance graph G without numerical calculation,
so does our graphical criterion with respect to the discovery of
dependencies. There are fields where discovering
dependencies is more important than discovering independencies (Wermuth, 
1995; Wermuth and Cox, 1998). It is in these fields where we believe that 
our graphical criterion has greater potential. In bioinformatics, for instance, 
the nodes of G may represent (the expression levels of) some genes under 
study. Bioinformaticians are typically more interested in discovering gene 
dependencies than independencies, because the former provide contexts in 
which the expression level of some genes is informative about that of some 
other genes, which may lead to hypothesize dependencies, functional rela- 
tions, causal relations, the effects of manipulation experiments, etc. See, for 
instance, (Butte and Kohane, 2000) for an application of covariance graphs 
to bioinformatics under the name of relevance networks. 

The rest of the paper is organized as follows. We start by reviewing some 
concepts in Section 2. We show in Section 3 that assuming the graphoid 
properties, weak transitivity and composition is not too restrictive. We prove 
in Section 4 that the existing graphical criterion for reading independencies
from covariance graphs is complete in a certain sense. This result, in
addition to being important in its own right, is important for reading as
many dependencies as possible from covariance graphs. We introduce in
Section 5 our graphical criterion for reading dependencies from covariance
graphs and prove that it is sound and complete in a certain sense. Finally,
we close with some discussion in Section 6.

2. Preliminaries. In this section, we introduce some concepts and re- 
sults that are used later in this paper. We first recall some results from 
graphical models. See, for instance, (Banerjee and Richardson, 2003; Kauer- 
mann, 1996; Lauritzen, 1996; Studeny, 2005) for further information. Let 
$V = \{1, \ldots, N\}$ be a finite set of size N. The elements of V are not
distinguished from singletons and the union of the sets
$I_1, \ldots, I_n \subseteq V$ is written as the juxtaposition
$I_1 \ldots I_n$. We assume throughout the paper that the union
of sets precedes the set difference when evaluating an expression. Unless
otherwise stated, all the graphs in this paper are defined over V. If a graph
G contains an undirected (respectively directed) edge between two nodes $v_1$
and $v_2$, then we say that $v_1 - v_2$ (respectively $v_1 \to v_2$) is in G.
If $v_1 \to v_2$ is in G then $v_1$ is called a parent of $v_2$ in G. Let
$Pa_G(I)$ denote the set of parents in G of the nodes in $I \subseteq V$. A
route from a node $v_1$ to a node $v_n$, denoted $v_1 : v_n$, in a graph G is
a sequence of nodes $v_1, \ldots, v_n$ such that there exists an edge in G
between $v_i$ and $v_{i+1}$ for all $1 \le i < n$. A path is a route
$v_1 : v_n$ in which the nodes $v_1, \ldots, v_n$ are distinct. A route
$v_1 : v_n$ is called undirected if $v_i - v_{i+1}$ is in G for all
$1 \le i < n$. A node $v_1$ is an ancestor of a node $v_n$ in G if there is a
route $v_1 : v_n$ in G such that $v_i - v_{i+1}$ or $v_i \to v_{i+1}$ is in G
for all $1 \le i < n$ (see footnote 1). Let $An_G(I)$ denote the set of
ancestors in G of the nodes in $I \subseteq V$. A node $v_n$ is a descendant
of a node $v_1$ in G if there is a route $v_1 : v_n$ in G such that
$v_i - v_{i+1}$ or $v_i \to v_{i+1}$ is in G for all $1 \le i < n$ and
$v_i \to v_{i+1}$ is in G for some $1 \le i < n$. A chain graph (CG) is
a graph (possibly) containing both undirected and directed edges and such
that no node is a descendant of itself. An undirected graph (UG) is a CG
containing only undirected edges. A directed and acyclic graph (DAG) is a
CG containing only directed edges. A set of nodes of a CG is connected if
there exists an undirected route in the CG between every pair of nodes in the
set. A connectivity component of a CG is a connected set that is maximal
with respect to set inclusion. The moral graph of a CG G, denoted $G^m$, is
the UG where two nodes are adjacent iff they are adjacent in G or they are
both in $Pa_G(B_i)$ for some connectivity component $B_i$ of G. The subgraph
of a CG G induced by $I \subseteq V$, denoted $G_I$, is the graph over I where
two nodes are connected by a (un)directed edge if that edge is in G. Let X, Y
and Z denote three disjoint subsets of V. We say that X is separated from
Y given Z in a CG G if every path in $(G_{An_G(XYZ)})^m$ from a node in X to a
node in Y has some node in Z. We denote such a separation statement by
$\mathrm{sep}_G(X, Y \mid Z)$.
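
The separation test just defined can be sketched in code. The following is a minimal illustration, not taken from the paper: the function name `cg_separated` and the edge-list encoding (undirected pairs, and directed pairs written as (parent, child)) are assumptions made here, and the nodes of XYZ are assumed to belong to their own ancestral set.

```python
from collections import deque
from itertools import combinations

def cg_separated(undirected, directed, X, Y, Z):
    """Sketch of sep_G(X, Y | Z) for a chain graph given as edge lists.

    `directed` holds (parent, child) pairs. X, Y and Z are assumed disjoint.
    """
    X, Y, Z = set(X), set(Y), set(Z)

    # Step 1: ancestral set of XYZ. Walk edges backwards, i.e. from w to v
    # whenever v - w or v -> w is in G. XYZ itself is included (assumption).
    back = {}
    for a, b in undirected:
        back.setdefault(a, set()).add(b)
        back.setdefault(b, set()).add(a)
    for a, b in directed:
        back.setdefault(b, set()).add(a)
    anc = X | Y | Z
    stack = list(anc)
    while stack:
        v = stack.pop()
        for u in back.get(v, ()):
            if u not in anc:
                anc.add(u)
                stack.append(u)

    # Step 2: subgraph induced by the ancestral set.
    und = [(a, b) for a, b in undirected if a in anc and b in anc]
    dire = [(a, b) for a, b in directed if a in anc and b in anc]

    # Step 3: connectivity components (union-find over undirected edges).
    comp = {v: v for v in anc}
    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]
            v = comp[v]
        return v
    for a, b in und:
        comp[find(a)] = find(b)

    # Step 4: moral graph. Keep every adjacency and join all pairs of
    # parents of the same connectivity component.
    adj = {v: set() for v in anc}
    for a, b in und + dire:
        adj[a].add(b)
        adj[b].add(a)
    parents = {}
    for a, b in dire:
        parents.setdefault(find(b), set()).add(a)
    for pa in parents.values():
        for a, b in combinations(pa, 2):
            adj[a].add(b)
            adj[b].add(a)

    # Step 5: X is separated from Y if no path from X to Y avoids Z.
    seen = set(X)
    queue = deque(X)
    while queue:
        v = queue.popleft()
        if v in Y:
            return False
        for w in adj[v]:
            if w not in seen and w not in Z:
                seen.add(w)
                queue.append(w)
    return True
```

For instance, with the directed edges A -> C and B -> C, the sketch reports A as separated from B given the empty set, but not given C, because moralization joins the two parents of C.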

Let $U = (U_i)_{i \in V}$ denote a vector of random variables and $U_I$
($I \subseteq V$) its subvector $(U_i)_{i \in I}$. We use upper-case letters to
denote random variables and the same letters in lower-case to denote their
states. Unless otherwise stated, all the probability distributions in this
paper are defined over U. Let X, Y, Z and W denote four disjoint subsets of V.
We represent by $X \perp_p Y \mid Z$ that $U_X$ is independent of $U_Y$ given
$U_Z$ in a probability distribution p. We represent by
$X \not\perp_p Y \mid Z$ that $X \perp_p Y \mid Z$ does not hold. A probability
distribution p is a graphoid if it satisfies the following properties:
Symmetry $X \perp_p Y \mid Z \Rightarrow Y \perp_p X \mid Z$, decomposition
$X \perp_p YW \mid Z \Rightarrow X \perp_p Y \mid Z$, weak union
$X \perp_p YW \mid Z \Rightarrow X \perp_p Y \mid ZW$, contraction
$X \perp_p Y \mid ZW \wedge X \perp_p W \mid Z \Rightarrow X \perp_p YW \mid Z$,
and intersection
$X \perp_p Y \mid ZW \wedge X \perp_p W \mid ZY \Rightarrow X \perp_p YW \mid Z$.
We say that a graphoid p is a WTC graphoid if it satisfies the following
two additional properties: Weak transitivity
$X \perp_p Y \mid Z \wedge X \perp_p Y \mid ZK \Rightarrow X \perp_p K \mid Z \vee K \perp_p Y \mid Z$
with $K \in V \setminus XYZ$, and composition
$X \perp_p Y \mid Z \wedge X \perp_p W \mid Z \Rightarrow X \perp_p YW \mid Z$.

1 Note that our definition of ancestor follows (Lauritzen, 1996) and differs from others
that exist in the literature, e.g. (Richardson and Spirtes, 2002).

Let X, Y and Z denote three disjoint subsets of V. We denote by
$X \perp_G Y \mid Z$ that a CG G represents that $U_X$ is independent of $U_Y$
given $U_Z$. We denote by $X \not\perp_G Y \mid Z$ that $X \perp_G Y \mid Z$
does not hold. In this paper, we are interested in the classic
Lauritzen-Wermuth-Frydenberg interpretation of CGs as independence models,
which is based on the following graphical criterion.

Definition 2.1. Given a CG G, $X \perp_G Y \mid Z$ if
$\mathrm{sep}_G(X, Y \mid Z)$.

However, in this paper we are also interested in the dual interpretation of
UGs as independence models that builds on the following graphical criterion.

Definition 2.2. Given an UG G, $X \perp_G Y \mid Z$ if
$\mathrm{sep}_G(X, Y \mid V \setminus XYZ)$.

The following rephrasing of the graphical criterion in Definition 2.2 may
be easier to recall: $X \perp_G Y \mid Z$ if every path in G from a node in X
to a node in Y has some node outside XYZ. When an UG is interpreted according
to the graphical criterion in Definition 2.1 we call it a concentration graph,
and when it is interpreted according to the graphical criterion in
Definition 2.2 we call it a covariance graph. A probability distribution p is
Markov wrt a CG, concentration graph or covariance graph G when
$X \perp_p Y \mid Z$ if $X \perp_G Y \mid Z$ for all X, Y and Z disjoint
subsets of V. A probability distribution p is faithful to a CG, concentration
graph or covariance graph G when $X \perp_p Y \mid Z$ iff $X \perp_G Y \mid Z$
for all X, Y and Z disjoint subsets of V. The concentration graph (aka minimal
undirected independence map or Markov network) of a probability distribution
p is the UG G where two nodes A and B are adjacent iff
$A \not\perp_p B \mid V \setminus AB$. The covariance graph (aka bi-directed
graph) of a probability distribution p is the UG G where two nodes A and B
are adjacent iff $A \not\perp_p B$. A WTC graphoid p is Markov wrt both its
covariance graph G and its concentration graph H. However, neither
$X \not\perp_G Y \mid Z$ nor $X \not\perp_H Y \mid Z$ implies
$X \not\perp_p Y \mid Z$, unless p is faithful to G or H. This is the
motivation for this paper.
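
The rephrased criterion of Definition 2.2 amounts to a reachability check in the subgraph induced by XYZ. The sketch below is only an illustration under assumptions made here: the function name `covariance_graph_separated` and the encoding of G as a list of undirected node pairs are hypothetical.

```python
from collections import deque

def covariance_graph_separated(edges, X, Y, Z):
    """Sketch of Definition 2.2: True if every path in G from a node in X
    to a node in Y has some node outside XYZ. X, Y, Z are assumed disjoint."""
    X, Y, Z = set(X), set(Y), set(Z)
    allowed = X | Y | Z

    # Keep only the edges whose two endpoints lie inside XYZ.
    adj = {}
    for a, b in edges:
        if a in allowed and b in allowed:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    # Breadth-first search from X; reaching Y means some path stays in XYZ.
    seen = set(X)
    queue = deque(X)
    while queue:
        v = queue.popleft()
        if v in Y:
            return False
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True
```

On the covariance graph with edges A - B and B - C, the sketch reads the independence of A and C given the empty set (the only path leaves AC through B) but not given B, which illustrates how this reading is dual to the concentration graph reading of Definition 2.1.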

3. WTC Graphoids. This paper is devoted to the study of WTC 
graphoids. We show in this section that WTC graphoids are worth study- 
ing because they include important families of probability distributions. For 
instance, any regular Gaussian probability distribution is a WTC graphoid
(Studeny, 2005, Sections 2.2.2, 2.3.5 and 2.3.6). The following theorem in- 
troduces another interesting family of WTC graphoids. 

Theorem 3.1. Let G be a CG. Any probability distribution p that is 
faithful to G is a WTC graphoid. 

Proof. Let q be any regular Gaussian probability distribution that is
faithful to G. Such probability distributions exist due to (Peña, 2011,
Theorems 1 and 2). Since p and q are faithful to G, $X \perp_p Y \mid Z$ iff
$X \perp_G Y \mid Z$ iff $X \perp_q Y \mid Z$ for all X, Y and Z disjoint
subsets of V. Therefore, p is a WTC graphoid because q is a WTC graphoid. □

The previous theorem is meaningful only if we prove that, for any CG, 
there exist probability distributions that are faithful to it. We do so in the 
following theorem. 

Theorem 3.2. Let G be a CG. If each random variable in U has a 
finite prescribed sample space with at least two possible states, then there 
exists a discrete probability distribution with the prescribed sample spaces 
for the random variables in U that is faithful to G. On the other hand, if the 
sample space of each random variable in U is $\mathbb{R}$, then there exist a regular
Gaussian probability distribution that is faithful to G and a continuous but 
non-Gaussian probability distribution that is faithful to G. 

Proof. The first and second statements in the theorem are proven in 
(Peña, 2009, Theorems 3 and 5) and (Peña, 2011, Theorems 1 and 2),
respectively. The third statement in the theorem can easily be proven by using
copulas (Nelsen, 2006) as follows. Let p denote any regular Gaussian
probability distribution that is faithful to G. Derive the Gaussian copula
for p. The copula represents the independence model of p stripped of its
univariate marginals. Therefore, the copula together with a set of arbitrary
univariate marginals can be used to generate a multivariate probability dis- 
tribution whose independence model is the one dictated by the copula and 
whose univariate marginals are the given ones. The desired result is achieved 
if the arbitrary marginals are chosen so that they are continuous but non- 
Gaussian. See (Nelsen, 2006) for more details. □ 

It is worth mentioning that the results in (Peña, 2009, Theorems 3 and
5) and (Peña, 2011, Theorems 1 and 2) are actually stronger than the first
and second statements in the previous theorem. Specifically, the results
reported there are that, in a certain measure-theoretic sense, almost all the
discrete probability distributions and regular Gaussian probability
distributions that are Markov wrt a CG are faithful to it. Finally, note that
the marginals and conditionals of a regular Gaussian probability distribution
are regular Gaussian probability distributions and, thus, WTC graphoids.
In fact, this property can be generalized to all the WTC graphoids. The
following theorem, originally reported in (Peña et al., 2006, Theorem 5),
formalizes this result. 

Theorem 3.3. Let p be a WTC graphoid and let $I \subseteq V$. Then,
$p(U_{V \setminus I})$ is a WTC graphoid. If $p(U_{V \setminus I} \mid U_I = u_I)$
has the same (in)dependencies for all $u_I$, then
$p(U_{V \setminus I} \mid U_I = u_I)$ for any $u_I$ is a WTC graphoid.

It is worth noting that many members of the families of WTC graphoids 
that we have presented in this section are Markov wrt their covariance graphs 
but not faithful to them. Hence, the need to develop a graphical criterion 
for reading dependencies from the covariance graph of a WTC graphoid. 
For example, consider any discrete, regular Gaussian, or continuous but 
non-Gaussian probability distribution p that is faithful to a CG with
$\{A \to B, B \to C\}$ as induced subgraph. Then, the covariance graph G of p
has $\{A - B, A - C, B - C\}$ as induced subgraph and, thus, p is not faithful
to G since $A \not\perp_G C \mid Z$ but $A \perp_p C \mid Z$ for some
$Z \subseteq V$. This example is based
on (Drton and Richardson, 2003; Pearl and Wermuth, 1994). The interested 
reader is referred to these works for a characterization of the independence 
models that can be represented exactly by DAGs but not by covariance 
graphs. 
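
The example above can be checked numerically on one concrete distribution. The sketch below is an illustration only and rests on assumptions that are not in the paper: a linear Gaussian model A = e_A, B = A + e_B, C = B + e_C with independent unit-variance noise terms, whose covariances are hard-coded, and hypothetical helper names `cov`, `partial_cov` and `covariance_graph`.

```python
from fractions import Fraction

# Covariances implied by the assumed model A = e_A, B = A + e_B, C = B + e_C
# with independent unit-variance noise (a hypothetical instance of A -> B -> C).
NODES = ["A", "B", "C"]
COV = {
    ("A", "A"): 1, ("A", "B"): 1, ("A", "C"): 1,
    ("B", "B"): 2, ("B", "C"): 2,
    ("C", "C"): 3,
}

def cov(i, j):
    """Covariance of U_i and U_j as an exact fraction (symmetric lookup)."""
    return Fraction(COV[(i, j)] if (i, j) in COV else COV[(j, i)])

def partial_cov(i, j, k):
    """Covariance of U_i and U_j given U_k. In a regular Gaussian
    distribution it is zero iff U_i and U_j are independent given U_k."""
    return cov(i, j) - cov(i, k) * cov(k, j) / cov(k, k)

def covariance_graph(nodes):
    """Edges of the covariance graph: pairs with non-zero marginal covariance."""
    return [(a, b) for n, a in enumerate(nodes)
            for b in nodes[n + 1:] if cov(a, b) != 0]

edges = covariance_graph(NODES)
print(edges)                        # all three pairs are adjacent
print(partial_cov("A", "C", "B"))   # zero: A and C are independent given B
```

Under these assumptions the covariance graph is complete over {A, B, C}, so Definition 2.2 reads no independence from it, yet the distribution satisfies the independence of A and C given B. This is the gap between being Markov and being faithful that the example describes.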

4. Reading Independencies. The graphical criterion in Definition 2.2 
is sound for reading independencies from the covariance graph G of a WTC 
graphoid p, that is, it only identifies independencies in p (Banerjee and 
Richardson, 2003; Kauermann, 1996, Proposition 2.2). In this section, we 
show that this graphical criterion is complete in the sense that it identifies 
all the independencies in p that can be identified by studying G alone. This 
completeness result, in addition to being important in its own right, is crucial for
reading as many dependencies as possible from G, as we will see in the next 
section. In order to prove the referred completeness result, it suffices to prove 
that there exist WTC graphoids that are faithful to G, because G is their 
covariance graph and they only have the independencies that the graphical 
criterion in Definition 2.2 identifies from G. Therefore, we cannot derive 
more independencies from G alone than those identified by this graphical 
criterion, because p may be one of the WTC graphoids that are faithful to 
G. The following two theorems prove the desired result. 
Theorem 4.1. Let G be a covariance graph. If each random variable
in U has a finite prescribed sample space with at least two possible states,
then there exists a discrete probability distribution with the prescribed sample
spaces for the random variables in U that is faithful to G. On the other
hand, if the sample space of each random variable in U is $\mathbb{R}$, then there
exist a regular Gaussian probability distribution that is faithful to G and a
continuous but non-Gaussian probability distribution that is faithful to G.

Proof. We start by proving the first statement in the theorem. First,
we create a DAG from G as follows: Replace each edge $A - B$ in G with
$A \leftarrow V'_{AB} \to B$ where $V'_{AB}$ is a newly created node. Call the
resulting DAG H and let V' denote all the newly created nodes. It is easy to
see that $X \perp_H Y \mid Z$ iff $X \perp_G Y \mid Z$ for all X, Y and Z
disjoint subsets of V.

Let $U' = (U'_i)_{i \in V'}$ denote a vector of random variables such that each
of them has any finite sample space with at least two possible states. Let
p(U, U') denote any discrete probability distribution that is faithful to H.
Such probability distributions exist by (Meek, 1995, Theorem 7). Note that,
for any X, Y and Z disjoint subsets of V, $X \perp_{p(U)} Y \mid Z$ iff
$X \perp_{p(U,U')} Y \mid Z$ iff $X \perp_H Y \mid Z$ iff $X \perp_G Y \mid Z$.
Consequently p(U) is faithful to G.

The second statement in the theorem can be proven in much the same
way as the first if p(U, U') denotes now any regular Gaussian probability
distribution that is faithful to H. Such probability distributions exist by
(Spirtes et al., 1993, Theorem 3.2). Note that p(U) is regular Gaussian.

Finally, the third statement in the theorem can be proven by using copulas 
as we did in the proof of Theorem 3.2. □ 
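
The construction of the DAG H used in the proof can be sketched as follows. This is an illustration only: the function name `covariance_graph_to_dag` and the tuple used to name each newly created node are assumptions made here.

```python
def covariance_graph_to_dag(edges):
    """Sketch of the construction in the proof: replace each undirected edge
    A - B of the covariance graph with A <- V'_AB -> B.

    Returns the directed edges of H as (parent, child) pairs together with
    the list of newly created nodes.
    """
    dag_edges = []
    new_nodes = []
    for a, b in edges:
        v = ("new", a, b)  # hypothetical label for the new node V'_AB
        new_nodes.append(v)
        dag_edges.append((v, a))
        dag_edges.append((v, b))
    return dag_edges, new_nodes

dag_edges, new_nodes = covariance_graph_to_dag([("A", "B"), ("B", "C")])
```

In the resulting graph every directed edge leaves a newly created node, so no original node has children. Two original nodes share a parent exactly when they are adjacent in G.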

An alternative proof of the second statement in the theorem above follows 
from (Richardson and Spirtes, 2002, Theorem 7.5). Specifically, (Richardson 
and Spirtes, 2002) introduces a new class of graphical models called ancestral 
graphs, whose edges can be undirected, directed or bi-directed
($\leftrightarrow$). Covariance graphs are equivalent to ancestral graphs with
only bi-directed edges.
(Richardson and Spirtes, 2002, Theorem 7.5) proves that, for any ancestral
graph, there is a regular Gaussian probability distribution that is faithful to
it. In Appendix A, we strengthen the second statement in the theorem above
by proving that, in a certain measure-theoretic sense, almost all the regular
Gaussian probability distributions that are Markov wrt a covariance graph
are faithful to it. Although this result is not used in this paper, we consider
it to be interesting in its own right and, thus, we report it.

Theorem 4.2. Let G be a covariance graph. Any probability distribution 
p that is faithful to G is a WTC graphoid. 
Proof. Let q be any regular Gaussian probability distribution that is
faithful to G. Such probability distributions exist due to Theorem 4.1. Since
p and q are faithful to G, $X \perp_p Y \mid Z$ iff $X \perp_G Y \mid Z$ iff
$X \perp_q Y \mid Z$ for all X, Y and Z disjoint subsets of V. Therefore, p is
a WTC graphoid because q is a WTC graphoid. □

As explained at the beginning of this section, the previous two theorems 
imply that the graphical criterion in Definition 2.2 is complete for reading 
independencies from the covariance graph G of a WTC graphoid p, in the 
sense that it identifies all the independencies in p that can be identified by 
studying G alone. An equivalent formulation of this result is that the graph- 
ical criterion is complete in the sense that it identifies all the independencies 
that are shared by all the WTC graphoids whose covariance graph is G. 
Finally, it is worth mentioning that the graphical criterion in Definition 2.2 
is not complete in the more stringent sense of being able to identify all the 
independencies in p. Actually, no sound graphical criterion for reading inde- 
pendencies from G is complete in this latter sense. An example illustrating 
this follows. 

Example 4.1. Let p and p' denote two WTC graphoids that are faithful to the
CGs in the left and center of Table 1, respectively. Such WTC graphoids exist
by (Peña, 2011, Theorems 1 and 2). Note that $A \perp_p C \mid B$ whereas
$A \not\perp_{p'} C \mid B$. Let G and H denote the covariance and
concentration graphs of p, respectively. Likewise, let G' and H' denote the
covariance and concentration graphs of p', respectively. Note that G, H, G'
and H' are all the complete graph over {A, B, C, D}. Now, let us assume that
we are dealing with p. Then, no sound graphical criterion entails
$A \perp_p C \mid B$ from G because this independence does not hold in p', and
it is impossible to know whether we are dealing with p or p' on the sole basis
of G.
5. Reading Dependencies. In this section, we present the main contribution
of this paper: We introduce a graphical criterion for reading dependencies
from the covariance graph of a WTC graphoid and prove that it is sound and
complete in a certain sense. If G is the covariance graph of a WTC graphoid p
then we know, by definition of G, that $A \not\perp_p B$ for all the edges
$A - B$ in G. We call these dependencies the dependence base of p. Further
dependencies in p can be derived from the dependence base via the WTC graphoid
properties. For this purpose, we rephrase the WTC graphoid properties in their
contrapositive form as follows. Symmetry
$Y \not\perp_p X \mid Z \Rightarrow X \not\perp_p Y \mid Z$. Decomposition
$X \not\perp_p Y \mid Z \Rightarrow X \not\perp_p YW \mid Z$. Weak union
$X \not\perp_p Y \mid ZW \Rightarrow X \not\perp_p YW \mid Z$. Contraction
$X \not\perp_p YW \mid Z \Rightarrow X \not\perp_p Y \mid ZW \vee X \not\perp_p W \mid Z$
is problematic for deriving new dependencies because it contains a disjunction
in the consequent and, thus, we split it into two properties: Contraction1
$X \not\perp_p YW \mid Z \wedge X \perp_p Y \mid ZW \Rightarrow X \not\perp_p W \mid Z$,
and contraction2
$X \not\perp_p YW \mid Z \wedge X \perp_p W \mid Z \Rightarrow X \not\perp_p Y \mid ZW$.
Likewise, intersection gives rise to intersection1
$X \not\perp_p YW \mid Z \wedge X \perp_p Y \mid ZW \Rightarrow X \not\perp_p W \mid ZY$,
and intersection2
$X \not\perp_p YW \mid Z \wedge X \perp_p W \mid ZY \Rightarrow X \not\perp_p Y \mid ZW$.
Note that intersection1 and intersection2 are equivalent and, thus, we refer
to them simply as intersection. Similarly, weak transitivity gives rise to
weak transitivity1
$X \not\perp_p K \mid Z \wedge K \not\perp_p Y \mid Z \wedge X \perp_p Y \mid Z \Rightarrow X \not\perp_p Y \mid ZK$,
and weak transitivity2
$X \not\perp_p K \mid Z \wedge K \not\perp_p Y \mid Z \wedge X \perp_p Y \mid ZK \Rightarrow X \not\perp_p Y \mid Z$.
Finally, composition
$X \not\perp_p YW \mid Z \Rightarrow X \not\perp_p Y \mid Z \vee X \not\perp_p W \mid Z$
gives rise to composition1
$X \not\perp_p YW \mid Z \wedge X \perp_p Y \mid Z \Rightarrow X \not\perp_p W \mid Z$,
and composition2
$X \not\perp_p YW \mid Z \wedge X \perp_p W \mid Z \Rightarrow X \not\perp_p Y \mid Z$.
Since composition1 and composition2 are equivalent, we refer to them simply as
composition. The independence in the antecedent of any of the properties above
holds if it can be read off G via the graphical criterion in Definition 2.2.
This is the best solution we can hope for because, as discussed in the
previous section, this graphical criterion is sound and complete for WTC
graphoids. Moreover, this solution does not require more information than what
is available, namely G or equivalently the dependence base of p. We define the
WTC graphoid closure of the dependence base of p as the set of dependencies
that are in the dependence base of p plus those that can be derived from it by
applying the nine properties above.

Let X, Y and Z denote three disjoint subsets of V. We say that X is
connected to Y given Z in an UG G if there exist two nodes $A \in X$ and
$B \in Y$ such that there exists a single path between A and B in G whose
nodes are all outside $XYZ \setminus AB$. We denote such a connection
statement by $\mathrm{con}_G(X, Y \mid Z)$. We denote by $X \sim_G Y \mid Z$
that an UG G represents that $U_X$ is dependent on $U_Y$ given $U_Z$. We can
now introduce our graphical criterion
for reading dependencies from the covariance graph of a WTC graphoid.

Definition 5.1. Given the covariance graph G of a WTC graphoid,
$X \sim_G Y \mid Z$ if $\mathrm{con}_G(X, Y \mid V \setminus XYZ)$.

The following rephrasing of the graphical criterion in the previous
definition may be easier to recall: $X \sim_G Y \mid Z$ if there exist two
nodes $A \in X$ and $B \in Y$ such that there exists a single path between A
and B in G whose nodes are all in ABZ. Interestingly, the graphical criterion
in the previous definition is dual to the following graphical criterion, which
we developed in (Peña et al., 2009) for reading dependencies from the
concentration graph of a WTC graphoid.

Definition 5.2. Given the concentration graph G of a WTC graphoid,
$X \sim_G Y \mid Z$ if $\mathrm{con}_G(X, Y \mid Z)$.

We proved in (Peña et al., 2009, Theorems 5 and 6, Example 3) that
the graphical criterion in Definition 5.2 is sound and complete in a certain
sense. We prove in the following two theorems that the graphical criterion
in Definition 5.1 is also sound and complete in a certain sense.
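
The rephrased criterion of Definition 5.1 can be sketched by counting the paths between A and B whose nodes are all in ABZ and requiring that there be exactly one. The code below is an illustration under assumptions made here: the names `count_paths` and `reads_dependence` and the edge-list encoding of G are hypothetical, and the depth-first enumeration is a simple choice rather than an efficient one.

```python
def count_paths(adj, a, b, inner, limit=2):
    """Count paths (no repeated nodes) from a to b whose intermediate nodes
    are all in `inner`, stopping as soon as `limit` paths have been found."""
    count = 0
    stack = [(a, {a})]
    while stack:
        v, visited = stack.pop()
        for w in adj.get(v, ()):
            if w == b:
                count += 1
                if count >= limit:
                    return count
            elif w in inner and w not in visited:
                stack.append((w, visited | {w}))
    return count

def reads_dependence(edges, X, Y, Z):
    """Sketch of Definition 5.1: True if, for some A in X and B in Y, there
    is a single path between A and B in G whose nodes are all in ABZ.
    X, Y and Z are assumed disjoint."""
    X, Y, Z = set(X), set(Y), set(Z)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Intermediate nodes of a path between A and B must lie in Z.
    return any(count_paths(adj, a, b, Z) == 1 for a in X for b in Y)
```

On the covariance graph with edges A - B and B - C, the sketch reads the dependence of A and C given B (the single path A - B - C lies in ABZ) but not given the empty set. On the complete graph over {A, B, C} it reads the dependence of A and C given the empty set but not given B, because two paths between A and C then lie in ABZ.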

Theorem 5.1. Let G be the covariance graph of a WTC graphoid p. If
$X \sim_G Y \mid Z$, then $X \not\perp_p Y \mid Z$ is in the WTC graphoid
closure of the dependence base of p.

Proof. Let $X \sim_G Y \mid Z$ hold due to a path $A : B$ with $A \in X$ and
$B \in Y$. We prove the theorem by induction over the length of $A : B$. We
first prove it for length one. Let Z' denote the largest subset of Z such that
there is a path in G from A to every node in Z' and all the nodes in these
paths are in AZ'. Then, $B \perp_G Z'$ because, otherwise, there would be two
paths between A and B whose nodes are all in ABZ, which would contradict that
$X \sim_G Y \mid Z$ holds due to $A : B$. Thus, $B \perp_p Z'$. Moreover,
$A \not\perp_p B$ because A and B are adjacent in G. Then,
$AZ' \not\perp_p B$ by symmetry and decomposition, which together with
$B \perp_p Z'$ imply $A \not\perp_p B \mid Z'$ by symmetry and contraction2.
Note that if $Z' = \emptyset$, then $A \not\perp_p B$ directly implies
$A \not\perp_p B \mid Z'$. In any case, $A \not\perp_p B \mid Z'$ implies
$A \not\perp_p B(Z \setminus Z') \mid Z'$ by decomposition. Now, note that
$AZ' \perp_G Z \setminus Z'$ by definition of Z' and thus
$AZ' \perp_p Z \setminus Z'$, which implies $A \perp_p Z \setminus Z' \mid Z'$
by symmetry and weak union, which together with
$A \not\perp_p B(Z \setminus Z') \mid Z'$ imply $A \not\perp_p B \mid Z$ by
contraction2. Note that if $Z \setminus Z' = \emptyset$, then
$A \not\perp_p B \mid Z'$ directly implies $A \not\perp_p B \mid Z$. In any
case, $A \not\perp_p B \mid Z$ implies $X \not\perp_p Y \mid Z$ by symmetry
and decomposition.
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Assume as induction hypothesis that the theorem holds when the length 
of A : B is smaller than n. We now prove it for length n. Let C be any node 
in A : B except A and B. Note that C £ Z and thus A-LqB\Z \ C, which 
implies A-L p B\Z \ C. Moreover, note that A^ G C\Z \ C holds due to the 
subpath of A : B between A and C, which we denote as A : C. To see it, 
note that A : C is the only path between A and C in G whose nodes are all 
in AZ, because if there were two such paths then there would be two paths 
between A and B in G whose nodes are all in ABZ, which would contradict 
that X ~ gY\Z holds due to A : B. Likewise, C ~ gB\Z \ C. Moreover, 
A ~ G C\Z \ C and C ~ G B\Z \ C imply respectively A I P C\Z \ C and 
C JL p B\Z\C by the induction hypothesis, which together with A-L p B\Z\C 
imply AJL P B\Z by weak transitivityl, which implies X JL P Y\Z by symmetry 
and decomposition. 

Finally, note that the above derivation of X JL p Y\Z only makes use of the 
dependence base of p and the nine properties introduced at the beginning 
of this section. Thus, X JL P Y\Z is in the WTC graphoid closure of the 
dependence base of p. □ 

Note that we do not make use of the composition property in the proof 
above. However, we do use the fact that the graphical criterion in Defini- 
tion 2.2 is sound. The proof of this fact in (Banerjee and Richardson, 2003; 
Kauermann, 1996, Proposition 2.2) does make use of the composition prop- 
erty. 

Theorem 5.2. Let G be the covariance graph of a WTC graphoid p. If
$X \not\perp_p Y \mid Z$ is in the WTC graphoid closure of the dependence
base of p, then $X \sim_G Y \mid Z$.

Proof. Let H denote the concentration graph that has the same vertices
and edges as G. In other words, G and H are the same UG but with different
interpretations. Note that $X \sim_G Y \mid Z$ iff
$X \sim_H Y \mid V \setminus XYZ$, which follows from the fact that
$X \sim_G Y \mid Z$ iff $con_G(X, Y \mid V \setminus XYZ)$ iff
$con_H(X, Y \mid V \setminus XYZ)$ iff $X \sim_H Y \mid V \setminus XYZ$.

Clearly, all the dependencies in the dependence base of p are identified by
the graphical criterion in Definition 5.1. Therefore, it only remains to prove
that this graphical criterion satisfies the nine properties introduced at the
beginning of this section. We do so below with the help of H. Note that the
graphical criterion in Definition 5.2 applied to H satisfies the nine properties
introduced at the beginning of this section (Peña et al., 2009, Theorem 6,
Example 3).

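• Symmetry $Y \sim_G X \mid Z \Rightarrow X \sim_G Y \mid Z$.
Trivial.

• Decomposition $X \sim_G Y \mid Z \Rightarrow X \sim_G YW \mid Z$.
$X \sim_G Y \mid Z$ implies $X \sim_H Y \mid V \setminus XYZ$ by definition,
which implies $X \sim_H YW \mid V \setminus XYZW$ by weak union, which
implies $X \sim_G YW \mid Z$ by definition.

• Weak union $X \sim_G Y \mid ZW \Rightarrow X \sim_G YW \mid Z$.
$X \sim_G Y \mid ZW$ implies $X \sim_H Y \mid V \setminus XYZW$ by
definition, which implies $X \sim_H YW \mid V \setminus XYZW$ by
decomposition, which implies $X \sim_G YW \mid Z$ by definition.

• Contraction1 $X \sim_G YW \mid Z \wedge X \perp_G Y \mid ZW \Rightarrow X \sim_G W \mid Z$.
$X \sim_G YW \mid Z$ and $X \perp_G Y \mid ZW$ imply respectively
$X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H Y \mid V \setminus XYZW$
by definition, which imply $X \sim_H W \mid V \setminus XZW$ by
contraction2, which implies $X \sim_G W \mid Z$ by definition.

• Contraction2 $X \sim_G YW \mid Z \wedge X \perp_G W \mid Z \Rightarrow X \sim_G Y \mid ZW$.
$X \sim_G YW \mid Z$ and $X \perp_G W \mid Z$ imply respectively
$X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H W \mid V \setminus XZW$
by definition, which imply $X \sim_H Y \mid V \setminus XYZW$ by
contraction1, which implies $X \sim_G Y \mid ZW$ by definition.

• Intersection $X \sim_G YW \mid Z \wedge X \perp_G Y \mid ZW \Rightarrow X \sim_G W \mid ZY$.
$X \sim_G YW \mid Z$ and $X \perp_G Y \mid ZW$ imply respectively
$X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H Y \mid V \setminus XYZW$
by definition, which imply $X \sim_H W \mid V \setminus XYZW$ by
composition, which implies $X \sim_G W \mid ZY$ by definition.

• Weak transitivity1 $X \sim_G K \mid Z \wedge K \sim_G Y \mid Z \wedge X \perp_G Y \mid Z \Rightarrow X \sim_G Y \mid ZK$.
$X \sim_G K \mid Z$, $K \sim_G Y \mid Z$ and $X \perp_G Y \mid Z$ imply
respectively $X \sim_H K \mid V \setminus XZK$,
$K \sim_H Y \mid V \setminus YZK$ and $X \perp_H Y \mid V \setminus XYZ$.
Moreover, $X \sim_H K \mid V \setminus XZK$ and
$K \sim_H Y \mid V \setminus YZK$ imply respectively
$X \sim_H YK \mid V \setminus XYZK$ and $XK \sim_H Y \mid V \setminus XYZK$
by symmetry and weak union, which together with
$X \perp_H Y \mid V \setminus XYZ$ imply respectively
$X \sim_H K \mid V \setminus XYZK$ and $K \sim_H Y \mid V \setminus XYZK$ by
symmetry and contraction1, which together with
$X \perp_H Y \mid V \setminus XYZ$ imply $X \sim_H Y \mid V \setminus XYZK$
by weak transitivity2, which implies $X \sim_G Y \mid ZK$ by definition.

• Weak transitivity2 $X \sim_G K \mid Z \wedge K \sim_G Y \mid Z \wedge X \perp_G Y \mid ZK \Rightarrow X \sim_G Y \mid Z$.
Trivial because the antecedent involves a contradiction. To see it, note
that $X \sim_G K \mid Z$, $K \sim_G Y \mid Z$ and $X \perp_G Y \mid ZK$ imply
respectively $X \sim_H K \mid V \setminus XZK$,
$K \sim_H Y \mid V \setminus YZK$ and $X \perp_H Y \mid V \setminus XYZK$.
Moreover, $X \sim_H K \mid V \setminus XZK$ and
$K \sim_H Y \mid V \setminus YZK$ imply respectively
$X \sim_H YK \mid V \setminus XYZK$ and $XK \sim_H Y \mid V \setminus XYZK$
by symmetry and weak union, which together with
$X \perp_H Y \mid V \setminus XYZK$ imply respectively
$X \sim_H K \mid V \setminus XYZK$ and $K \sim_H Y \mid V \setminus XYZK$ by
symmetry and composition, which together with
$X \perp_H Y \mid V \setminus XYZK$ imply a contradiction as shown in
(Peña et al., 2009, Theorem 6).

• Composition $X \sim_G YW \mid Z \wedge X \perp_G Y \mid Z \Rightarrow X \sim_G W \mid Z$.
$X \sim_G YW \mid Z$ and $X \perp_G Y \mid Z$ imply respectively
$X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H Y \mid V \setminus XYZ$ by
definition, which imply $X \sim_H W \mid V \setminus XZW$ by intersection,
which implies $X \sim_G W \mid Z$ by definition.

□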

While Theorem 5.1 was somewhat expected because if there is a single 
path between A and B in G whose nodes are all in ABZ then there is 
no possibility of path cancellation, the combination of Theorems 5.1 and 
5.2 is rather exciting: We now have a simple graphical criterion to decide 
whether a given dependence is or is not in the WTC graphoid closure of the 
dependence base of p, i.e. we do not need to try to find a derivation of it, 
which is usually a tedious task. 

We devote the rest of this section to some observations that follow from 
the previous two theorems. A sensible question to ask is whether the graph- 
ical criterion in Definition 5.1 is complete in the sense of being able to iden- 
tify all the dependencies shared by all the WTC graphoids whose covariance 
graph is a given UG. The answer is no. An illustrative example follows. 

Example 5.1. Let G denote the UG in Table 1. Consider any WTC
graphoid p whose covariance graph is G. Such WTC graphoids exist by
Theorems 4.1 and 4.2. Then, $A \not\perp_p B \mid C$ or
$A \not\perp_p C \mid B$ because otherwise $A \perp_p BC$ by intersection,
which is a contradiction because $A \sim_G BC$ implies $A \not\perp_p BC$
by Theorem 5.1. Assume $A \not\perp_p B \mid C$. Note that
$B \sim_G D \mid C$ implies $B \not\perp_p D \mid C$ by Theorem 5.1. Then,
$A \not\perp_p B \mid C$ and $B \not\perp_p D \mid C$ together with
$A \perp_p D \mid C$, which follows from $A \perp_G D \mid C$, imply
$A \not\perp_p D \mid BC$ by weak transitivity1. Likewise,
$A \not\perp_p E \mid BC$ when assuming $A \not\perp_p C \mid B$. Then,
$A \not\perp_p D \mid BC$ or $A \not\perp_p E \mid BC$, which imply
$A \not\perp_p DE \mid BC$ by decomposition. However,
$A \sim_G DE \mid BC$ does not hold.

Note that the fact that the graphical criterion in Definition 5.1 is not
complete in the latter sense implies that it is not complete either in the
more stringent sense of being able to identify all the dependencies in the WTC
graphoid at hand. Actually, no sound graphical criterion for reading
dependencies from the covariance graph of a WTC graphoid can be complete in
this more stringent sense. To see it, consider again Example 4.1. Let us
assume that we are dealing with p'. Then, no sound graphical criterion entails
$A \not\perp_{p'} C \mid B$ from G' because this dependence does not hold in
p, and it is impossible to know whether we are dealing with p or p' on the
sole basis of G'.

It is worth mentioning that the graphical criteria in Definitions 5.1 and
5.2 complement each other, as each of them can read dependencies that
the other cannot. To see it, consider the WTC graphoid p in Example 4.1.
Then, $A \sim_G B$ and thus $A \not\perp_p B$ by Theorem 5.1. However, this
dependence cannot be derived from H because $A \sim_H B$ does not hold. On
the other hand, $A \sim_H B \mid CD$ and thus $A \not\perp_p B \mid CD$ by
(Peña et al., 2009, Theorem 5). However, this dependence cannot be derived
from G because $A \sim_G B \mid CD$ does not hold.

Again, a sensible question to ask is whether the joint use of the graphical 
criteria in Definitions 5.1 and 5.2 is complete in the sense of being able 
to identify all the dependencies shared by all the WTC graphoids whose 
covariance and concentration graphs are two given UGs. The answer is no. 
An illustrative example follows. 

Example 5.2. Let G denote the UG in Table 1. Let H denote the complete
graph over {A, B, C, D, E, F}. Consider any WTC graphoid p whose
covariance and concentration graphs are G and H, respectively. Such WTC
graphoids exist. To see it, it suffices to take any WTC graphoid that is
faithful to G, which exists by Theorems 4.1 and 4.2. Recall that we have
proven in Example 5.1 that $A \not\perp_p DE \mid BC$. However, neither
$A \sim_G DE \mid BC$ nor $A \sim_H DE \mid BC$ holds.

Note that the fact that the joint use of the graphical criteria in Definitions
5.1 and 5.2 is not complete in the latter sense implies that it is not
complete either in the more stringent sense of being able to identify all the
dependencies in the WTC graphoid at hand. Actually, no pair of sound graphical
criteria for reading dependencies from the covariance and concentration graphs
of a WTC graphoid can be complete in this more stringent sense. To see it,
consider again Example 4.1. Let us assume that we are dealing with p'. Then,
no pair of sound graphical criteria entails $A \not\perp_{p'} C \mid B$ from
G' and H' because this dependence does not hold in p, and it is impossible to
know whether we are dealing with p or p' on the sole basis of G' and H'.

The following corollary extends to WTC graphoids a result originally 
proven in (Malouche and Rajaratnam, 2009, Theorem 3) for Gaussian proba- 
bility distributions. The extension is straightforward thanks to the graphical 
criterion in Definition 5.1. 

Corollary 5.1. Let G be the covariance graph of a WTC graphoid p. 
If G is a forest, then p is faithful to G. 

Proof. Assume to the contrary that p is not faithful to G. Since G is
the covariance graph of p, the previous assumption is equivalent to assuming
that there exist three disjoint subsets of V, here denoted X, Y and Z, such
that $X \not\perp_G Y \mid Z$ but $X \perp_p Y \mid Z$. However,
$X \not\perp_G Y \mid Z$ implies that there must exist a path in G between
some node $A \in X$ and some node $B \in Y$ whose nodes are all in ABZ.
Furthermore, since G is a forest, that must be the only such path between A
and B in G. However, this implies $X \sim_G Y \mid Z$ and thus
$X \not\perp_p Y \mid Z$ by Theorem 5.1, which is a contradiction. □

Given the covariance graph G of a WTC graphoid p, $X \not\perp_G Y \mid Z$
does not imply $X \not\perp_p Y \mid Z$. This is actually the reason for
this paper. However, if G is a forest, then the previous corollary proves
that $X \not\perp_G Y \mid Z$ does imply $X \not\perp_p Y \mid Z$ and,
moreover, that this way of reading dependencies from G is complete in the
strictest sense discussed above. The following corollary extends to WTC
graphoids a result originally proven in (Malouche and Rajaratnam, 2009,
Lemma 5) for Gaussian probability distributions. The extension is
straightforward thanks to the graphical criteria in Definitions 5.1 and 5.2.

Corollary 5.2. Let G and H be, respectively, the covariance and
concentration graphs of a WTC graphoid p. Then, G and H have the same
connected components. Moreover, if a connected component in G (respectively
H) is a tree then the corresponding connected component in H (respectively
G) is the complete graph.

Proof. First, we prove that G and H have the same connected components.
If two nodes A and B belong to the same connected component in G,
then $A \sim_G B \mid Z$ for some $Z \subseteq V \setminus AB$ and thus
$A \not\perp_p B \mid Z$ by Theorem 5.1. However, if A and B belong to
different connected components in H, then $A \perp_H B \mid Z$ and thus
$A \perp_p B \mid Z$, which is a contradiction. On the other hand, if two
nodes A and B belong to the same connected component in H, then
$A \sim_H B \mid Z$ for some $Z \subseteq V \setminus AB$ and thus
$A \not\perp_p B \mid Z$ by (Peña et al., 2009, Theorem 5). However, if A
and B belong to different connected components in G, then
$A \perp_G B \mid Z$ and thus $A \perp_p B \mid Z$, which is a contradiction.

Now, take any connected component $\mathcal{C}$ in G that is a tree. We prove
that the corresponding connected component $\mathcal{D}$ in H is the complete
graph. If two nodes A and B belong to $\mathcal{C}$, then
$A \sim_G B \mid V \setminus AB$ and thus
$A \not\perp_p B \mid V \setminus AB$ by Theorem 5.1, which implies that A
and B are adjacent in $\mathcal{D}$.

Finally, take any connected component $\mathcal{D}$ in H that is a tree. We
prove that the corresponding connected component $\mathcal{C}$ in G is the
complete graph. If two nodes A and B belong to $\mathcal{D}$, then
$A \sim_H B$ and thus $A \not\perp_p B$ by (Peña et al., 2009, Theorem 5),
which implies that A and B are adjacent in $\mathcal{C}$. □

Note that the converse of the second statement in the previous corollary
is not true. The following example illustrates this.

Example 5.3. Let G (respectively H) be the covariance (respectively
concentration) graph of a WTC graphoid p. Assume that G (respectively H)
is the complete graph and p is faithful to it. Such WTC graphoids exist by
Theorems 4.1 and 4.2 (respectively Theorems 3.1 and 3.2). Then, p has no
independencies. Consequently, H (respectively G) is the complete graph and
p is faithful to it.
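As a small numerical illustration of the second statement of Corollary 5.2, consider a regular Gaussian distribution (a WTC graphoid) whose covariance graph is the path A - B - C, which is a tree. The sketch below inverts an assumed covariance matrix exactly and checks that the concentration matrix has no zero entry, i.e. that the concentration graph is complete. The matrix values and the helper name `inverse` are assumptions chosen for illustration only.

```python
from fractions import Fraction

def inverse(M):
    """Exact matrix inverse by Gauss-Jordan elimination over fractions."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        A[c], A[pivot] = A[pivot], A[c]
        p = A[c][c]
        A[c] = [x / p for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

# Assumed covariance matrix over (A, B, C): the zero entry encodes that
# A and C are marginally independent, so the covariance graph is A - B - C.
sigma = [[2, 1, 0],
         [1, 2, 1],
         [0, 1, 2]]
concentration = inverse(sigma)
complete = all(concentration[i][j] != 0
               for i in range(3) for j in range(3) if i != j)
print(complete)
```

A zero off-diagonal entry of the concentration matrix would correspond to a missing edge in the concentration graph, which the corollary rules out here.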

6. Discussion. In this paper, we have provided new insight into
covariance graphs by introducing a graphical criterion for reading
dependencies from the covariance graph of a WTC graphoid. We have shown that
WTC graphoids are not too restrictive a family of probability distributions
by showing that it includes interesting discrete, Gaussian, and continuous
but non-Gaussian probability distributions. We have proven that the new
graphical criterion is sound and complete in a certain sense. In order to prove
these properties, we have had to prove first that the graphical criterion in
(Banerjee and Richardson, 2003; Kauermann, 1996) for reading independencies
from the covariance graph of a WTC graphoid is complete in a certain
sense. We have done so by proving that there are discrete, Gaussian, and
continuous but non-Gaussian probability distributions that are faithful to 
any covariance graph. This result is also important because it implies that 
there exist probability distributions that covariance graphs can represent ex- 
actly but CGs cannot. Therefore, covariance graphs complement CGs. The 
following example illustrates this. 

Example 6.1. Consider the covariance graph
$G = \{A - B, B - C, C - D, D - A\}$. Consider any CG H that represents the
same independencies as G. Note that H must have some edge between A and B,
B and C, C and D, and D and A. However, A and C cannot be adjacent in H
because $A \perp_G C$. Likewise, B and D cannot be adjacent in H because
$B \perp_G D$. Then,
$H = \{A \to B, B \leftarrow C, A \to D, D \leftarrow C\}$ because otherwise
either $A \perp_H C \mid B$ or $A \perp_H C \mid D$ or
$A \perp_H C \mid BD$, whereas $A \not\perp_G C \mid B$ and
$A \not\perp_G C \mid D$ and $A \not\perp_G C \mid BD$. However, such an H
implies $B \not\perp_H D$ whereas $B \perp_G D$. Consequently, no CG can
represent the same independencies as G. This implies that every probability
distribution that is faithful to G (recall that such probability
distributions exist by Theorem 3.2) is not faithful to any CG.

It is fair to mention that we are not the first to note that covariance
graphs complement other more popular graphical models. For instance, it
follows from (Drton and Richardson, 2003; Pearl and Wermuth, 1994) that
covariance graphs complement DAGs. Our example above simply extends
this observation to CGs.

Another consequence of the faithfulness result in Theorem 3.2 is that 
it proves wrong the misconception that covariance graphs are densely con- 
nected because their edges represent marginal dependencies. Specifically, the 
theorem implies that there are probability distributions that are faithful to 
any covariance graph, no matter its topology. 

Interestingly, the graphical criterion developed in this paper is dual to
the one presented in (Peña et al., 2009) for reading dependencies from the
concentration graph of a WTC graphoid. This duality resembles the duality
existing between the graphical criteria for reading independencies from
concentration and covariance graphs (Banerjee and Richardson, 2003;
Kauermann, 1996). We have also shown that the new graphical criterion and the
one presented in (Peña et al., 2009) complement each other, as there may
be dependencies that only one of them can identify.

Finally, we have pointed out some limitations of the graphical criterion 
introduced in this paper that suggest future lines of research. For instance, it 
remains an open question whether it is possible to develop a similar graphical 
criterion that is complete in a stricter sense than the one used in this paper. 
It also remains an open question whether our faithfulness result in Appendix 
A for regular Gaussian probability distributions can be extended to discrete 
probability distributions with the help of the parameterizations in (Luppar- 
elli et al., 2009). Another line of action is the application of our graphical 
criterion in bioinformatics. In such an application, the covariance graph has 
to be learnt from gene expression data via, for instance, hypothesis tests. 
The data available for learning is typically scarce due to the high cost asso- 
ciated with its production. In this scenario, covariance graphs are easier to 
learn than Bayesian networks or concentration graphs, which are the
graphical models commonly used in bioinformatics, because the former involves
testing for marginal (in)dependencies whereas the latter involve testing for
conditional (in)dependencies. We do not suggest with this that one should
stop using Bayesian networks and concentration graphs in bioinformatics.
Specifically, Bayesian networks have an important advantage over covariance
graphs, namely that they can provide us with insight into the mechanistic or
causal process underlying the data at hand. What we suggest is that when
the data at hand is not considered enough to learn a reliable Bayesian
network, one may be willing to learn a less informative but more reliable model
such as a covariance graph, particularly now that we have graphical criteria
for reading both dependencies and independencies off it.

Note that this paper only studies the structure of covariance graphs and, 
thus, it does not deal with their parameterization and/or parameter estima- 
tion. The interested reader is referred to (Chaudhuri et al., 2007; Drton and 
Richardson, 2003) for Gaussian models and (Drton and Richardson, 2008) 
for discrete models. However, it may be worth warning here that finding the 
maximum likelihood estimates of the parameters of a covariance graph can 
be hard, depending on the parameterization considered. This is particularly 
true for discrete models (Drton and Richardson, 2008). The reason is that 
a marginal independence can imply complicated parameter constraints in 
some parameterizations, because a marginal dependence is a property of the 
joint probability distribution rather than of the relevant marginal probabil- 
ity distribution. An early work that showed the latter is (Zentgraf, 1975), 
where the author gives an example of two-way interactions implying three- 
way interaction. In summary, learning the structure of a covariance graph 
may be simpler than learning the structure of a concentration graph. How- 
ever, estimating the parameters of the former may be harder than estimating 
the parameters of the latter. 

7. Appendix A. We strengthen the second statement in Theorem 4.1
by proving that, in a certain measure-theoretic sense, almost all the regular
Gaussian probability distributions that are Markov wrt a covariance graph
are faithful to it. Although this result is not used in this paper, we consider
it to be interesting in its own right and, thus, we report it here.

We start by recalling some results from matrix theory. See, for instance,
(Horn and Johnson, 1985) for more information. Let
$M = (M_{i,j})_{i,j \in V}$ denote a square matrix. Let $M_{I,J}$ with
$I, J \subseteq V$ denote its submatrix $(M_{i,j})_{i \in I, j \in J}$.
The determinant of M can recursively be computed, for fixed $i \in V$, as

$$\det(M) = \sum_{j \in V} (-1)^{i+j} M_{i,j} \det(M_{\setminus(ij)}),$$

where $M_{\setminus(ij)}$ denotes the matrix produced by removing the row i
and column j from M. Note then that $\det(M)$ is a real polynomial in the
entries of M. If $\det(M) \neq 0$ then the inverse of M can be computed as
$(M^{-1})_{i,j} = (-1)^{i+j} \det(M_{\setminus(ji)}) / \det(M)$ for all
$i, j \in V$. We say that M is strictly diagonally dominant if
$abs(M_{i,i}) > \sum_{\{j \in V : j \neq i\}} abs(M_{i,j})$ for all
$i \in V$, where abs() denotes absolute value. A matrix M is Hermitian if it
is equal to the matrix resulting from, first, transposing M and, then,
replacing each entry by its complex conjugate. Clearly, a real symmetric
matrix is Hermitian. A real symmetric N x N matrix M is positive definite if
$x^T M x > 0$ for all non-zero $x \in \mathbb{R}^N$.
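The cofactor expansion of the determinant and the cofactor formula for the inverse recalled above can be sketched as follows. This is a minimal illustration with exact rational arithmetic, assuming 0-based indices and a matrix stored as a list of rows; the names `minor`, `det` and `inverse` are chosen here and do not come from the paper.

```python
from fractions import Fraction

def minor(M, i, j):
    """Matrix produced by removing row i and column j from M."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j))
               for j in range(len(M)))

def inverse(M):
    """Inverse via (M^-1)_{i,j} = (-1)^(i+j) det(M without row j, col i) / det(M)."""
    d = det(M)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(M)
    return [[Fraction((-1) ** (i + j) * det(minor(M, j, i)), d)
             for j in range(n)] for i in range(n)]

# Usage on an assumed 2 x 2 integer matrix.
print(det([[2, 1], [1, 3]]))
print(inverse([[2, 1], [1, 3]]))
```

The recursion makes it visible that the determinant is a polynomial in the entries of M, which is the property used later in the appendix. It is exponential in the matrix size, so it is only suitable for small examples.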

We continue by proving some auxiliary results. We assume hereinafter
that the sample space of each random variable in U is $\mathbb{R}$. Let
$\mathcal{N}(G)$ denote the set of regular Gaussian probability distributions
p such that $A \perp_p B$ for any two nodes A and B that are not adjacent in
G. Note that these are exactly the regular Gaussian probability distributions
that are Markov wrt G (Banerjee and Richardson, 2003; Kauermann, 1996,
Proposition 2.2). We parameterize each probability distribution
$p \in \mathcal{N}(G)$ with its mean vector $\mu$ and covariance matrix
$\Sigma$. Note that the values of some of these parameters are determined by
the values of other parameters or by G. Specifically, the following
constraints apply:

C1. $\Sigma_{i,j} = \Sigma_{j,i}$ for all i, j because $\Sigma$ is symmetric.

C2. $\Sigma_{i,j} = 0$ for all i, j such that i and j are not adjacent in G.
To see it, recall that if i and j are not adjacent in G then $i \perp_p j$
and, thus, $\Sigma_{i,j} = 0$ (Studeny, 2005, Corollary 2.3).

Hereinafter, the parameters whose values are not determined by the
constraints above are called non-determined (nd) parameters. However, the
values the nd parameters can take are further constrained by the fact that
these values must correspond to some probability distribution in
$\mathcal{N}(G)$. In other words, the values the nd parameters can take are
constrained by the fact that $\Sigma$ must be positive definite. That is why
the set of nd parameter values satisfying this requirement is hereinafter
called the nd parameter space for $\mathcal{N}(G)$. We do not work out the
inequalities defining the nd parameter space because they are irrelevant for
our purpose. The number of nd parameters is what we call the dimension of G,
and we denote it as d. Specifically, $d = 2|V| + |G|$ where $|V|$ and $|G|$
are, respectively, the number of nodes and edges in G:

• $|V|$ due to $\mu$.

• $|V|$ due to the entries in the diagonal of $\Sigma$.

• $|G|$ due to the entries below the diagonal of $\Sigma$ that are not
identically zero. To see this, recall from the constraint C2 that there is
one entry below the diagonal of $\Sigma$ that is not identically zero for
each undirected edge in G.
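The bookkeeping above can be made concrete with a short sketch that counts the nd parameters and extends a choice of nd covariance parameters to a full matrix using the constraints C1 and C2. The function names and the dictionary-based inputs are assumptions made for illustration, and the sketch does not check positive definiteness.

```python
def nd_dimension(nodes, edges):
    """Dimension d = 2|V| + |G|: means, variances, and one covariance per edge."""
    return 2 * len(nodes) + len(edges)

def extend_sigma(nodes, variances, edge_cov):
    """Build the full covariance matrix from the nd parameters.

    variances: dict node -> diagonal entry
    edge_cov:  dict (node, node) -> covariance, one key per edge of G
    """
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    sigma = [[0] * n for _ in range(n)]      # C2: non-adjacent pairs stay zero
    for v in nodes:
        sigma[idx[v]][idx[v]] = variances[v]
    for (a, b), c in edge_cov.items():
        sigma[idx[a]][idx[b]] = c
        sigma[idx[b]][idx[a]] = c            # C1: symmetry
    return sigma

# Usage on an assumed four-cycle A - B - C - D - A.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(nd_dimension(nodes, edges))
print(extend_sigma(nodes, {v: 4 for v in nodes}, {e: 1 for e in edges}))
```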

Lemma 7.1. Let G be a covariance graph of dimension d. The nd
parameter space for $\mathcal{N}(G)$ has positive Lebesgue measure wrt
$\mathbb{R}^d$.

Proof. Since we do not know a closed-form expression of the nd parameter
space for $\mathcal{N}(G)$, we take an indirect approach to prove the result.
Recall that, by definition, the nd parameter space for $\mathcal{N}(G)$ is
the set of real values such that, after the extension determined by the
constraints C1 and C2, $\Sigma$ is positive definite. Therefore, the nd
parameters $\mu$ can take values independently of the nd parameters in
$\Sigma$. However, the nd parameters in $\Sigma$ cannot take values
independently of one another because, otherwise, $\Sigma$ may not be
positive definite. However, if the entries in the diagonal of $\Sigma$ take
values in $(|V| - 1, \infty)$ and the rest of the nd parameters in $\Sigma$
take values in $[-1, 1]$, then the nd parameters in $\Sigma$ can take values
independently of one another. To see it, note that in this case $\Sigma$
will always be Hermitian, strictly diagonally dominant, and with strictly
positive diagonal entries, which implies that $\Sigma$ will always be
positive definite (Horn and Johnson, 1985, Corollary 7.2.3).

The subset of the nd parameter space of $\mathcal{N}(G)$ described in the
paragraph above has positive volume in $\mathbb{R}^d$ and, thus, it has
positive Lebesgue measure wrt $\mathbb{R}^d$. Then, the nd parameter space
of $\mathcal{N}(G)$ has positive Lebesgue measure wrt $\mathbb{R}^d$. □
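The key step of the proof above, namely that diagonal entries in $(|V| - 1, \infty)$ and off-diagonal nd entries in $[-1, 1]$ always yield a positive definite matrix, can be checked empirically. The sketch below samples such matrices for an assumed graph and tests positive definiteness by attempting a Cholesky factorisation in floating point. The sampling ranges, the seed, and the function names are assumptions; a numerical check of this kind illustrates the claim but does not prove it.

```python
import math
import random

def is_positive_definite(S):
    """Try a Cholesky factorisation of the symmetric matrix S.

    In exact arithmetic the factorisation succeeds iff S is positive
    definite; here the test is subject to floating-point rounding.
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def sample_sigma(nodes, edges, rng):
    """Sample a covariance matrix from the box used in the proof."""
    n = len(nodes)
    idx = {v: k for k, v in enumerate(nodes)}
    S = [[0.0] * n for _ in range(n)]
    for v in nodes:
        # Diagonal entry strictly above |V| - 1 (upper bound 5.0 is arbitrary).
        S[idx[v]][idx[v]] = (n - 1) + rng.uniform(0.01, 5.0)
    for a, b in edges:
        w = rng.uniform(-1.0, 1.0)
        S[idx[a]][idx[b]] = S[idx[b]][idx[a]] = w
    return S

rng = random.Random(0)
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(all(is_positive_definite(sample_sigma(nodes, edges, rng))
          for _ in range(200)))
```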

Lemma 7.2. Let G be a covariance graph. For every $i, j \in V$ and
$K \subseteq V \setminus ij$, there exists a real polynomial $S(i,j,K)$ in
the nd parameters in the parameterization of the probability distributions
in $\mathcal{N}(G)$ such that, for every $p \in \mathcal{N}(G)$,
$i \perp_p j \mid K$ iff $S(i,j,K)$ vanishes for the nd parameter values
coding p.

Proof. Let $\Sigma$ denote the covariance matrix of p. Note that
$i \perp_p j \mid K$ iff $((\Sigma_{ijK,ijK})^{-1})_{i,j} = 0$ (Lauritzen,
1996, Proposition 5.2). Recall that
$((\Sigma_{ijK,ijK})^{-1})_{i,j} = (-1)^{\alpha_{i,j}} \det(\Sigma_{iK,jK}) / \det(\Sigma_{ijK,ijK})$
with $\alpha_{i,j} \in \{0, 1\}$. Moreover, note that
$\det(\Sigma_{ijK,ijK}) > 0$ because $\Sigma$ is positive definite (Studeny,
2005, p. 237). Then, $i \perp_p j \mid K$ iff $\det(\Sigma_{iK,jK}) = 0$.
Moreover, as noted in Section 2, $\det(\Sigma_{iK,jK})$ is a real polynomial
in the entries of $\Sigma$. Thus, $i \perp_p j \mid K$ iff a real polynomial
$R(i,j,K)$ in the entries of $\Sigma$ vanishes. Recall that each entry of
$\Sigma$ that is not identically zero corresponds to one of the nd
parameters in the parameterization of the probability distributions in
$\mathcal{N}(G)$. Therefore, $R(i,j,K)$ can be expressed as a real
polynomial $S(i,j,K)$ in the nd parameters. Therefore, $i \perp_p j \mid K$
iff $S(i,j,K)$ vanishes for the nd parameter values coding p. □
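The polynomial of Lemma 7.2 can be evaluated numerically for a given covariance matrix: by the proof above it is the determinant of the submatrix of $\Sigma$ with rows iK and columns jK. The sketch below is a minimal illustration with exact fractions; the function names and the example matrix (an assumed Gaussian whose covariance graph is the path A - B - C) are not taken from the paper.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over exact fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result              # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result

def s_polynomial(sigma, index, i, j, K):
    """Value of det(Sigma_{iK,jK}); it is zero iff i and j are independent given K."""
    rows = [index[i]] + [index[k] for k in K]
    cols = [index[j]] + [index[k] for k in K]
    return det([[sigma[r][c] for c in cols] for r in rows])

index = {"A": 0, "B": 1, "C": 2}
sigma = [[2, 1, 0],
         [1, 2, 1],
         [0, 1, 2]]
print(s_polynomial(sigma, index, "A", "C", []))     # marginal case
print(s_polynomial(sigma, index, "A", "C", ["B"]))  # conditioning on B
```

In this assumed example the first value is zero (A and C are marginally independent) while the second is non-zero, so A and C become dependent given B, as one expects from a covariance graph.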

We interpret the polynomial in the previous lemma as a real function on a
real Euclidean space that includes the nd parameter space for
$\mathcal{N}(G)$. We say that the polynomial in the previous lemma is
non-trivial if not all the values of the nd parameters are solutions to the
polynomial. This is equivalent to the requirement that the polynomial is not
identically zero.

Lemma 7.3. Let G be a covariance graph of dimension d. The subset of
the nd parameter space for $\mathcal{N}(G)$ that corresponds to the
probability distributions in $\mathcal{N}(G)$ that are not faithful to G has
zero Lebesgue measure wrt $\mathbb{R}^d$.

Proof. Recall that $\mathcal{N}(G)$ contains exactly the regular Gaussian
distributions that are Markov wrt G. Therefore, for any probability
distribution $p \in \mathcal{N}(G)$ not to be faithful to G, p must satisfy
some independence that is not entailed by G. That is, there must exist three
disjoint subsets of V, here denoted as I, J and K, such that
$I \not\perp_G J \mid K$ but $I \perp_p J \mid K$. However, if
$I \not\perp_G J \mid K$ then $i \not\perp_G j \mid K$ for some $i \in I$
and $j \in J$. Furthermore, if $I \perp_p J \mid K$ then
$i \perp_p j \mid K$ by symmetry and decomposition. By Lemma 7.2, there
exists a real polynomial $S(i,j,K)$ in the nd parameters in the
parameterization of the probability distributions in $\mathcal{N}(G)$ such
that, for every $q \in \mathcal{N}(G)$, $i \perp_q j \mid K$ iff $S(i,j,K)$
vanishes for the nd parameter values coding q. Furthermore, $S(i,j,K)$ is
non-trivial (Kauermann, 1996, Theorem 3.1). Let $sol(i,j,K)$ denote the set
of solutions to the polynomial $S(i,j,K)$. Then, $sol(i,j,K)$ has zero
Lebesgue measure wrt $\mathbb{R}^d$ because it consists of the solutions to
a non-trivial real polynomial in real variables (the nd parameters)
(Okamoto, 1973). Then,

$$sol = \bigcup_{\{I,J,K \subseteq V \text{ disjoint} \,:\, I \not\perp_G J \mid K\}} \; \bigcup_{\{i \in I, j \in J \,:\, i \not\perp_G j \mid K\}} sol(i,j,K)$$

has zero Lebesgue measure wrt $\mathbb{R}^d$ because the finite union of
sets of zero Lebesgue measure has zero Lebesgue measure too. Consequently,
the subset of the nd parameter space for $\mathcal{N}(G)$ that corresponds to
the probability distributions in $\mathcal{N}(G)$ that are not faithful to G
has zero Lebesgue measure wrt $\mathbb{R}^d$ because it is contained in
sol. □

In summary, it follows from Lemmas 7.1 and 7.3 that, in the
measure-theoretic sense described, almost all the elements of the nd
parameter space for $\mathcal{N}(G)$ correspond to probability distributions
in $\mathcal{N}(G)$ that are faithful to G. Since this correspondence is
clearly one-to-one, it follows that almost all the regular Gaussian
distributions in $\mathcal{N}(G)$ are faithful to G. We think that this
result can easily be extended to strictly positive discrete probability
distributions with the help of the parameterizations proposed in (Lupparelli
et al., 2009). We do not elaborate further on this issue in this paper though.

A word of caution is due at this point. It may be tempting to infer from
the measure-theoretic results above that every regular Gaussian probability
distribution p one encounters in reality is almost surely faithful to its
covariance graph G. This may lead one to conclude that our graphical criterion
for reading from G dependencies holding in p is not needed, since
$X \not\perp_G Y \mid Z$ almost surely implies $X \not\perp_p Y \mid Z$.
This argument is valid if p has been drawn from $\mathcal{N}(G)$ at random.
However, we believe that p is more likely to have been carefully engineered
(e.g. by natural evolution in the case of the gene networks mentioned in
Section 1) than to have been drawn at random. Consequently, and despite the
measure-theoretic results above, we think that one cannot safely assume that
p is almost surely faithful to G. Hence the need for the graphical criterion
proposed in this paper.